

Multilingual Language Processing From Bytes

Author: Brunk Cliff
Gillick Dan
Subramanya Amarnag
Vinyals Oriol
Publication date: 01/01/2016

We describe an LSTM-based model, which we call Byte-to-Span (BTS), that reads text as bytes and outputs span annotations of the form [start, length, label], where start positions, lengths, and labels are separate entries in our vocabulary. Because we operate directly on Unicode bytes rather than language-specific words or characters, we can analyze text in many languages with a single model. Due to the small vocabulary size, these multilingual models are very compact, yet they produce results similar to or better than the state of the art in Part-of-Speech tagging and Named Entity Recognition for systems that use only the provided training datasets (no external data sources). Our models learn "from scratch" in that they do not rely on any elements of the standard Natural Language Processing pipeline (including tokenization), and thus can run in standalone fashion on raw text.
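The input/output representation described above can be sketched as a small data-preparation step. This is an illustrative encoding only, assuming spans are annotated as character offsets and converted to absolute UTF-8 byte offsets; the token spellings (`S:<offset>`, `L:<length>`), the helper names, and the example sentence are hypothetical, and the sketch does not include the LSTM itself.

```python
from typing import List, Tuple

# Hypothetical annotation format: (char_start, char_end, label) over the Unicode string.
Span = Tuple[int, int, str]


def char_span_to_byte_span(text: str, start: int, end: int) -> Tuple[int, int]:
    """Convert a character span [start, end) into a UTF-8 (byte_start, byte_length) pair."""
    byte_start = len(text[:start].encode("utf-8"))
    byte_len = len(text[start:end].encode("utf-8"))
    return byte_start, byte_len


def encode_input(text: str) -> List[int]:
    """Model input: raw UTF-8 bytes, each in 0..255, so the input alphabet has 256 symbols."""
    return list(text.encode("utf-8"))


def encode_spans(text: str, spans: List[Span]) -> List[str]:
    """Flatten spans into a sequence of (start, length, label) tokens.

    The token spellings are hypothetical; the point is that start positions,
    lengths, and labels are separate vocabulary entries.
    """
    tokens: List[str] = []
    for char_start, char_end, label in sorted(spans):
        byte_start, byte_len = char_span_to_byte_span(text, char_start, char_end)
        tokens += [f"S:{byte_start}", f"L:{byte_len}", label]
    return tokens


text = "Zoë lives in Köln"
spans = [(0, 3, "PER"), (13, 17, "LOC")]
print(encode_input(text)[:4])
print(encode_spans(text, spans))
```

Because the input alphabet is just the 256 possible byte values, the same encoder applies unchanged to any language. Note that "ë" and "ö" each occupy two bytes, so byte offsets drift away from character offsets ("Köln" starts at character 13 but byte 14).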

arXiv.org e-Print Archive

Crossref

Metal Fluorides as Analogs for Studies on Phosphoryl Transfer Enzymes

Author: Admiraal
Anderson
Antony
Batsanov
Baur
Baxter
Berente
Bigay
Blackburn
Bock
Boone
Bowler
Bruice
Brunk
Burgos
Cavalli
Cleland
Cliff
Coleman
Du
Field
Fisher
Fovet
Frey
Geerlings
Graham
Griffin
Grigorenko
Hemsworth
Henry
Higashijima
Hilbert
Himo
Hoffman
Hur
Issartel
Jackson
Jin
Kamerlin
Khrenova
Klink
Klähn
Knowles
Kowalinski
Lahiri
Lassila
Leigh
Lienhard
Liu
Lowe
Madhusudan
Maegley
Marcos
Martin
Maruta
Mesmer
Mildvan
Oldfield
Park
Pauling
Plotnikov
Praefcke
Prasad
Pylypenko
Pérez-Gallegos
Ribeiro
Rittinger
Scheffzek
Schlichting
Senn
Sheftic
Shinoda
Shurki
Siegbahn
Smith
Sondek
Sośnicki
Sternweis
Thorsell
Topol
Toyoshima
Valiev
Wang
Warshel
Webster
Wolfenden
Zhang
Publication venue: Wiley
Publication date: 03/04/2017

The 1994 structure of a transition state analog with AlF₄⁻ and GDP complexed to G1, a small G protein, heralded a new field of research into the structure and mechanism of enzymes that manipulate transfer of the phosphoryl (PO₃⁻) group. The list of enzyme structures that embrace metal fluorides, MFₓ, as ligands that imitate either the phosphoryl group or a phosphate is now growing at over 80 per triennium. They fall into three distinct geometrical classes: (i) tetrahedral complexes, based on BeF₃⁻, mimic ground-state phosphates; (ii) octahedral complexes, primarily based on AlF₄⁻, mimic the "in-line" anionic transition state for phosphoryl transfer; and (iii) trigonal bipyramidal (tbp) complexes, represented by MgF₃⁻ and putative AlF₃⁰ moieties, additionally mimic the tbp geometry of the transition state. The interpretation of these structures provides a deeper mechanistic understanding of the behavior and manipulation of phosphate monoesters in molecular biology. This review provides a comprehensive overview of these structures, their uses, and their computational development. It questions the identification of AlF₃⁰ and MgF₄²⁻ as tbp species in protein complexes and discusses the relevance of physical organic chemistry and water-based model studies for understanding phosphoryl group transfer in enzymes. Based on the analysis of protein/MFₓ structures, it describes two roles for amino acid side chains that mediate proton transfers during phosphoryl transfer. First, they use hydrogen bonding to neutral oxygen nucleophiles to orient them for correct orbital overlap with the electrophilic phosphorus center. Second, they behave as classical general acid/base catalysts.

Crossref

Online Research @ Cardiff

The University of Manchester - Institutional Repository

White Rose Research Online

D2.2.5 MineSet™

Author: Cliff Brunk
Ron Kohavi

MineSet™ is a commercial data mining product from Silicon Graphics. It provides an interactive platform for data mining that integrates three technologies: database and file access, analytical data mining engines, and data visualization. MineSet supports the knowledge discovery process from data access and preparation, through iterative analysis and visualization, to deployment. MineSet uses a client-server architecture for scalability and support of large data sets. The data access component provides a rich set of transformations that can be used to process stored data into forms appropriate for visualization and analytical mining. MineSet's 2D and 3D visualization capabilities allow direct data visualization for exploratory analysis. The analytical mining algorithms create models that can be viewed using visualization tools specialized for the learned models or deployed as part of a larger system. Third-party vendors can interface to the MineSet tools for model deployment and for integration with other packages.

CiteSeerX

Pruning Decision Trees with Misclassification Costs

Author: Bradford Jeffrey P.
Brodley Carla E.
Brunk Cliff
Kohavi Ron
Kunz Clayton
Publication venue: 'Purdue University (bepress)'
Publication date: 01/01/1998

… decision tree classifiers in two learning situations: minimizing loss and probability estimation. In addition to the two most common methods for error minimization, CART's cost-complexity pruning and C4.5's error-based pruning, we study the extension of cost-complexity pruning to loss and two pruning variants based on Laplace corrections. We perform an empirical comparison of these methods and evaluate them with respect to three criteria: loss, mean squared error (MSE), and log-loss. We provide a bias-variance decomposition of the MSE to show how pruning affects the bias and variance. We found that applying the Laplace correction to estimate the probability distributions at the leaves was beneficial to all pruning methods, both for loss minimization and for estimating probabilities. Unlike in error minimization, and somewhat surprisingly, performing no pruning led to results on par with the other methods in terms of the evaluation criteria. The main advantage of pruning was the reduction in decision tree size, sometimes by a factor of 10. No method dominated the others on all datasets; even within the same domain, different pruning mechanisms are better for different loss matrices. We show this last result using Receiver Operating Characteristic (ROC) curves.
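The Laplace correction at a leaf and the loss-matrix setting can be illustrated with a small sketch. It assumes the standard Laplace estimate (n_c + 1) / (N + k) for a leaf holding N training examples over k classes, and predicts the class with the lowest expected loss under a loss matrix. The function names, the loss-matrix orientation, the squared-error definition, and the example numbers are hypothetical; the paper's specific pruning variants are not reproduced here.

```python
import math
from typing import List, Sequence


def laplace_probs(counts: Sequence[int]) -> List[float]:
    """Laplace-corrected class probabilities at a leaf: (n_c + 1) / (N + k)."""
    total, k = sum(counts), len(counts)
    return [(n + 1) / (total + k) for n in counts]


def min_loss_class(probs: Sequence[float], loss: Sequence[Sequence[float]]) -> int:
    """Return the class i minimizing expected loss sum_j p_j * loss[j][i].

    Assumed convention: loss[j][i] is the cost of predicting i when the truth is j.
    """
    k = len(probs)
    expected = [sum(probs[j] * loss[j][i] for j in range(k)) for i in range(k)]
    return min(range(k), key=lambda i: expected[i])


def squared_error(probs: Sequence[float], true_class: int) -> float:
    """Squared distance between the predicted distribution and the one-hot true label."""
    return sum((p - (1.0 if c == true_class else 0.0)) ** 2 for c, p in enumerate(probs))


def log_loss(probs: Sequence[float], true_class: int) -> float:
    """Negative natural log of the probability assigned to the true class."""
    return -math.log(probs[true_class])


# A pure leaf: 3 training examples of class 0, none of class 1.
counts = [3, 0]
raw = [n / sum(counts) for n in counts]  # [1.0, 0.0]: class 1 gets zero probability
smoothed = laplace_probs(counts)         # [0.8, 0.2]

# Hypothetical loss matrix: missing a true class-1 example costs 10, the reverse costs 1.
loss = [[0, 1], [10, 0]]
print(min_loss_class(raw, loss))       # → 0
print(min_loss_class(smoothed, loss))  # → 1
print(squared_error(smoothed, 0), log_loss(smoothed, 1))
```

With raw frequencies the pure leaf always predicts class 0, and its log-loss on a class-1 example is undefined (log of zero). With the Laplace estimate, the small probability left for class 1 keeps the log-loss finite and is enough to flip the loss-minimizing prediction when class-1 errors are ten times as costly.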

CiteSeerX

Purdue E-Pubs

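Metal Fluorides as Analogs for Studies on Phosphoryl Transfer Enzymes (German-language version; original title: "Metallfluoride als Analoga für Studien an Phosphoryltransferenzymen")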

Crossref